Overview

Dataset statistics

Number of variables: 10
Number of observations: 737
Missing cells: 417
Missing cells (%): 5.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 57.7 KiB
Average record size in memory: 80.2 B
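
The two derived figures in this table can be cross-checked from the raw counts. The sketch below assumes the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total memory divided by the number of observations; both formulas are assumptions that happen to reproduce the reported values.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 10
n_observations = 737
missing_cells = 417
total_size_kib = 57.7  # reported value, already rounded

# Assumed formulas (not stated in the report itself).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations  # 1 KiB = 1024 B

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```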

Variable types

Numeric: 9
Categorical: 1

Warnings

Pregnancies is highly correlated with Age (High correlation)
Glucose is highly correlated with Insulin (High correlation)
SkinThickness is highly correlated with BMI (High correlation)
Insulin is highly correlated with Glucose (High correlation)
BMI is highly correlated with SkinThickness (High correlation)
Age is highly correlated with Pregnancies (High correlation)
DiabetesPedigreeFunction is highly correlated with BMI (High correlation)
Glucose is highly correlated with Insulin and 1 other field (High correlation)
BMI is highly correlated with DiabetesPedigreeFunction and 1 other field (High correlation)
Outcome is highly correlated with Glucose (High correlation)
BloodPressure has 47 (6.4%) missing values (Missing)
Insulin has 356 (48.3%) missing values (Missing)
Age has 14 (1.9%) missing values (Missing)
df_index has unique values (Unique)
Pregnancies has 108 (14.7%) zeros (Zeros)
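
The percentages in the missing-value and zero warnings follow from the counts and the 737 observations. As a minimal sketch, assuming each percentage is simply the count divided by the number of observations, the snippet below recomputes them and confirms that the three per-column missing counts add up to the 417 missing cells reported in the dataset statistics. The helper name `pct` is introduced here for illustration only.

```python
n_observations = 737

# Counts copied from the warnings list.
missing = {"BloodPressure": 47, "Insulin": 356, "Age": 14}
zeros = {"Pregnancies": 108}


def pct(count, total=n_observations):
    """Share of observations, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


missing_pct = {name: pct(count) for name, count in missing.items()}
zeros_pct = {name: pct(count) for name, count in zeros.items()}
total_missing = sum(missing.values())

print(missing_pct)
print(zeros_pct)
print("Total missing cells:", total_missing)
```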

Reproduction

Analysis started: 2021-05-12 16:38:24.112812
Analysis finished: 2021-05-12 16:38:37.746811
Duration: 13.63 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 737
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 389.7367707
Minimum: 0
Maximum: 767
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 37.8
Q1: 206
Median: 393
Q3: 581
95-th percentile: 730.2
Maximum: 767
Range: 767
Interquartile range (IQR): 375

Descriptive statistics

Standard deviation: 221.277169
Coefficient of variation (CV): 0.5677605646
Kurtosis: -1.176693528
Mean: 389.7367707
Median Absolute Deviation (MAD): 188
Skewness: -0.04454773998
Sum: 287236
Variance: 48963.58551
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   1      0.1%
524                 1      0.1%
514                 1      0.1%
515                 1      0.1%
516                 1      0.1%
517                 1      0.1%
518                 1      0.1%
519                 1      0.1%
520                 1      0.1%
521                 1      0.1%
Other values (727)  727    98.6%

Value               Count  Frequency (%)
0                   1      0.1%
1                   1      0.1%
2                   1      0.1%
3                   1      0.1%
4                   1      0.1%
5                   1      0.1%
6                   1      0.1%
7                   1      0.1%
8                   1      0.1%
10                  1      0.1%

Value               Count  Frequency (%)
767                 1      0.1%
766                 1      0.1%
765                 1      0.1%
764                 1      0.1%
763                 1      0.1%
762                 1      0.1%
761                 1      0.1%
760                 1      0.1%
759                 1      0.1%
758                 1      0.1%

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.829036635
Minimum: 0
Maximum: 15
Zeros: 108
Zeros (%): 14.7%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 6
95-th percentile: 10
Maximum: 15
Range: 15
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.350547209
Coefficient of variation (CV): 0.8750366027
Kurtosis: 0.007999414434
Mean: 3.829036635
Median Absolute Deviation (MAD): 2
Skewness: 0.8736186517
Sum: 2822
Variance: 11.2261666
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=16)
Value               Count  Frequency (%)
1                   129    17.5%
0                   108    14.7%
2                   97     13.2%
3                   74     10.0%
4                   66     9.0%
5                   54     7.3%
6                   48     6.5%
7                   43     5.8%
8                   36     4.9%
9                   26     3.5%
Other values (6)    56     7.6%

Value               Count  Frequency (%)
0                   108    14.7%
1                   129    17.5%
2                   97     13.2%
3                   74     10.0%
4                   66     9.0%
5                   54     7.3%
6                   48     6.5%
7                   43     5.8%
8                   36     4.9%
9                   26     3.5%

Value               Count  Frequency (%)
15                  1      0.1%
14                  2      0.3%
13                  10     1.4%
12                  9      1.2%
11                  11     1.5%
10                  23     3.1%
9                   26     3.5%
8                   36     4.9%
7                   43     5.8%
6                   48     6.5%

Glucose
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 135
Distinct (%): 18.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121.8331072
Minimum: 44
Maximum: 199
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 44
5-th percentile: 80
Q1: 100
Median: 117
Q3: 141
95-th percentile: 181
Maximum: 199
Range: 155
Interquartile range (IQR): 41

Descriptive statistics

Standard deviation: 30.50980937
Coefficient of variation (CV): 0.2504229767
Kurtosis: -0.2725405002
Mean: 121.8331072
Median Absolute Deviation (MAD): 20
Skewness: 0.5424779303
Sum: 89791
Variance: 930.8484676
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
100                 17     2.3%
99                  16     2.2%
111                 14     1.9%
129                 14     1.9%
108                 13     1.8%
106                 13     1.8%
125                 13     1.8%
95                  13     1.8%
112                 13     1.8%
122                 12     1.6%
Other values (125)  599    81.3%

Value               Count  Frequency (%)
44                  1      0.1%
56                  1      0.1%
57                  1      0.1%
61                  1      0.1%
62                  1      0.1%
65                  1      0.1%
67                  1      0.1%
68                  3      0.4%
71                  4      0.5%
72                  1      0.1%

Value               Count  Frequency (%)
199                 1      0.1%
198                 1      0.1%
197                 4      0.5%
196                 3      0.4%
195                 2      0.3%
194                 3      0.4%
193                 2      0.3%
191                 1      0.1%
190                 1      0.1%
189                 4      0.5%
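
Several of the Glucose statistics are derived from one another, which gives a quick consistency check on the report. The sketch below recomputes the mean, range, IQR, variance and coefficient of variation from the other reported figures, using the standard definitions (range = max - min, IQR = Q3 - Q1, variance = std squared, CV = std / mean). Small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the Glucose statistics (the column has no missing values).
n = 737
total = 89791
minimum, maximum = 44, 199
q1, q3 = 100, 141
std = 30.50980937

mean = total / n                   # Sum / number of observations
value_range = maximum - minimum    # Range
iqr = q3 - q1                      # Interquartile range
variance = std ** 2                # Variance is the squared standard deviation
cv = std / mean                    # Coefficient of variation

print(f"mean={mean:.4f} range={value_range} IQR={iqr}")
print(f"variance={variance:.3f} CV={cv:.4f}")
```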

BloodPressure
Real number (ℝ≥0)

MISSING

Distinct: 46
Distinct (%): 6.7%
Missing: 47
Missing (%): 6.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.26521739
Minimum: 24
Maximum: 122
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 24
5-th percentile: 52
Q1: 64
Median: 72
Q3: 80
95-th percentile: 92
Maximum: 122
Range: 98
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.42022236
Coefficient of variation (CV): 0.1718699923
Kurtosis: 0.9051929048
Mean: 72.26521739
Median Absolute Deviation (MAD): 8
Skewness: 0.1293096436
Sum: 49863
Variance: 154.2619234
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=46)
Value               Count  Frequency (%)
70                  55     7.5%
74                  47     6.4%
72                  42     5.7%
68                  42     5.7%
78                  41     5.6%
76                  39     5.3%
64                  39     5.3%
60                  36     4.9%
80                  36     4.9%
62                  34     4.6%
Other values (36)   279    37.9%
(Missing)           47     6.4%

Value               Count  Frequency (%)
24                  1      0.1%
30                  2      0.3%
38                  1      0.1%
40                  1      0.1%
44                  4      0.5%
46                  2      0.3%
48                  4      0.5%
50                  13     1.8%
52                  10     1.4%
54                  11     1.5%

Value               Count  Frequency (%)
122                 1      0.1%
114                 1      0.1%
110                 2      0.3%
108                 2      0.3%
106                 3      0.4%
104                 2      0.3%
102                 1      0.1%
100                 3      0.4%
98                  3      0.4%
96                  3      0.4%

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 51
Distinct (%): 6.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.09596929
Minimum: 7
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 7
5-th percentile: 14
Q1: 25
Median: 29.09596929
Q3: 32
95-th percentile: 44
Maximum: 99
Range: 92
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 8.814606327
Coefficient of variation (CV): 0.3029493961
Kurtosis: 5.607054474
Mean: 29.09596929
Median Absolute Deviation (MAD): 3.90403071
Skewness: 0.8473838871
Sum: 21443.72937
Variance: 77.69728469
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
29.09596929         216    29.3%
32                  29     3.9%
30                  27     3.7%
27                  23     3.1%
33                  20     2.7%
23                  20     2.7%
18                  20     2.7%
28                  19     2.6%
31                  19     2.6%
39                  18     2.4%
Other values (41)   326    44.2%

Value               Count  Frequency (%)
7                   2      0.3%
8                   2      0.3%
10                  5      0.7%
11                  6      0.8%
12                  7      0.9%
13                  11     1.5%
14                  6      0.8%
15                  13     1.8%
16                  6      0.8%
17                  13     1.8%

Value               Count  Frequency (%)
99                  1      0.1%
63                  1      0.1%
60                  1      0.1%
56                  1      0.1%
54                  2      0.3%
52                  2      0.3%
51                  1      0.1%
50                  2      0.3%
49                  3      0.4%
48                  4      0.5%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 180
Distinct (%): 47.2%
Missing: 356
Missing (%): 48.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.2650919
Minimum: 14
Maximum: 846
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 14
5-th percentile: 43
Q1: 77
Median: 126
Q3: 190
95-th percentile: 392
Maximum: 846
Range: 832
Interquartile range (IQR): 113

Descriptive statistics

Standard deviation: 118.8441951
Coefficient of variation (CV): 0.7605293904
Kurtosis: 6.452059587
Mean: 156.2650919
Median Absolute Deviation (MAD): 54
Skewness: 2.173246634
Sum: 59537
Variance: 14123.9427
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
105                 11     1.5%
130                 9      1.2%
140                 9      1.2%
120                 8      1.1%
100                 7      0.9%
180                 7      0.9%
110                 6      0.8%
115                 6      0.8%
94                  6      0.8%
66                  5      0.7%
Other values (170)  307    41.7%
(Missing)           356    48.3%

Value               Count  Frequency (%)
14                  1      0.1%
15                  1      0.1%
16                  1      0.1%
18                  2      0.3%
22                  1      0.1%
23                  1      0.1%
25                  1      0.1%
29                  1      0.1%
36                  3      0.4%
37                  2      0.3%

Value               Count  Frequency (%)
846                 1      0.1%
744                 1      0.1%
680                 1      0.1%
600                 1      0.1%
579                 1      0.1%
545                 1      0.1%
543                 1      0.1%
540                 1      0.1%
510                 1      0.1%
495                 2      0.3%
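
For columns with missing values, the reported Distinct (%) is consistent with dividing the distinct count by the number of non-missing observations rather than by all 737 rows. This is inferred from the figures in this report, not from the tool's documentation, and `distinct_pct` is a hypothetical helper used only to show the arithmetic.

```python
n_observations = 737

# (distinct, missing) pairs copied from the variable summaries in this report.
columns = {
    "BloodPressure": (46, 47),
    "Insulin": (180, 356),
    "Age": (52, 14),
}


def distinct_pct(distinct, missing, total=n_observations):
    """Distinct values as a percentage of the non-missing observations."""
    return round(100 * distinct / (total - missing), 1)


for name, (distinct, missing) in columns.items():
    print(name, distinct_pct(distinct, missing))
```

Dividing by all 737 rows instead would give about 24.4% for Insulin, which does not match the reported 47.2%.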

BMI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 245
Distinct (%): 33.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.42890095
Minimum: 18.2
Maximum: 67.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 22.2
Q1: 27.5
Median: 32.3
Q3: 36.6
95-th percentile: 44.26
Maximum: 67.1
Range: 48.9
Interquartile range (IQR): 9.1

Descriptive statistics

Standard deviation: 6.902444669
Coefficient of variation (CV): 0.2128485538
Kurtosis: 0.9148683966
Mean: 32.42890095
Median Absolute Deviation (MAD): 4.6
Skewness: 0.5936784499
Sum: 23900.1
Variance: 47.6437424
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
31.2                12     1.6%
32                  12     1.6%
31.6                12     1.6%
33.3                10     1.4%
32.4                10     1.4%
30.1                9      1.2%
32.9                9      1.2%
30.8                9      1.2%
33.6                8      1.1%
32.8                8      1.1%
Other values (235)  638    86.6%

Value               Count  Frequency (%)
18.2                3      0.4%
18.4                1      0.1%
19.1                1      0.1%
19.3                1      0.1%
19.4                1      0.1%
19.5                2      0.3%
19.6                3      0.4%
19.9                1      0.1%
20                  1      0.1%
20.1                1      0.1%

Value               Count  Frequency (%)
67.1                1      0.1%
59.4                1      0.1%
57.3                1      0.1%
55                  1      0.1%
53.2                1      0.1%
52.9                1      0.1%
52.3                2      0.3%
49.7                1      0.1%
49.6                1      0.1%
49.3                1      0.1%

DiabetesPedigreeFunction
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 503
Distinct (%): 68.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4721085482
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.1418
Q1: 0.245
Median: 0.376
Q3: 0.624
95-th percentile: 1.1166
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.379

Descriptive statistics

Standard deviation: 0.3287749935
Coefficient of variation (CV): 0.6963970358
Kurtosis: 5.805183303
Mean: 0.4721085482
Median Absolute Deviation (MAD): 0.169
Skewness: 1.9341047
Sum: 347.944
Variance: 0.1080929964
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.259               5      0.7%
0.207               5      0.7%
0.238               5      0.7%
0.268               5      0.7%
0.19                4      0.5%
0.27                4      0.5%
0.284               4      0.5%
0.26                4      0.5%
Other values (493)  689    93.5%

Value               Count  Frequency (%)
0.078               1      0.1%
0.084               1      0.1%
0.085               1      0.1%
0.088               2      0.3%
0.089               1      0.1%
0.092               1      0.1%
0.1                 1      0.1%
0.101               1      0.1%
0.107               1      0.1%
0.108               1      0.1%

Value               Count  Frequency (%)
2.42                1      0.1%
2.329               1      0.1%
2.288               1      0.1%
2.137               1      0.1%
1.893               1      0.1%
1.781               1      0.1%
1.699               1      0.1%
1.698               1      0.1%
1.6                 1      0.1%
1.476               1      0.1%

Age
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 52
Distinct (%): 7.2%
Missing: 14
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.32780083
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.74423364
Coefficient of variation (CV): 0.3523854965
Kurtosis: 0.6064216252
Mean: 33.32780083
Median Absolute Deviation (MAD): 7
Skewness: 1.118982471
Sum: 24096
Variance: 137.9270238
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
22                  64     8.7%
21                  57     7.7%
25                  47     6.4%
24                  44     6.0%
23                  35     4.7%
28                  34     4.6%
27                  32     4.3%
26                  31     4.2%
29                  28     3.8%
31                  23     3.1%
Other values (42)   328    44.5%

Value               Count  Frequency (%)
21                  57     7.7%
22                  64     8.7%
23                  35     4.7%
24                  44     6.0%
25                  47     6.4%
26                  31     4.2%
27                  32     4.3%
28                  34     4.6%
29                  28     3.8%
30                  19     2.6%

Value               Count  Frequency (%)
81                  1      0.1%
72                  1      0.1%
70                  1      0.1%
69                  1      0.1%
68                  1      0.1%
67                  3      0.4%
66                  4      0.5%
65                  2      0.3%
64                  1      0.1%
63                  4      0.5%

Outcome
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value               Count
0                   477
1                   260

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 737
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value               Count  Frequency (%)
0                   477    64.7%
1                   260    35.3%

Length

Histogram of lengths of the category

Pie chart

Value               Count  Frequency (%)
0                   477    64.7%
1                   260    35.3%

Most occurring characters

Value               Count  Frequency (%)
0                   477    64.7%
1                   260    35.3%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      737    100.0%

Most frequent character per category

Decimal Number
Value               Count  Frequency (%)
0                   477    64.7%
1                   260    35.3%

Most occurring scripts

Value               Count  Frequency (%)
Common              737    100.0%

Most frequent character per script

Common
Value               Count  Frequency (%)
0                   477    64.7%
1                   260    35.3%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               737    100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
0                   477    64.7%
1                   260    35.3%

Interactions

[Figures: 81 pairwise interaction plots (Matplotlib v3.3.2)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the slope of the line (its angle to the x-axis) affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
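
A minimal sketch of that calculation in plain Python follows. The function name `pearson_r` is chosen here for illustration, and the sample values are the Glucose and BMI entries of the first five rows shown in the Sample section, used only as example input.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors of covariance and standard deviations
    # cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)  # ZeroDivisionError if a variable is constant


glucose = [148.0, 85.0, 183.0, 89.0, 137.0]
bmi = [33.6, 26.6, 23.3, 28.1, 43.1]
print(pearson_r(glucose, bmi))
```

Rescaling or shifting either input, for example `pearson_r([3 * g + 5 for g in glucose], bmi)`, returns the same value up to floating-point error, which illustrates the invariance described above.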

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
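
As a sketch of that definition, the code below ranks each variable and then applies the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention assumed here, and the function names are illustrative rather than taken from the report.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic but nonlinear example, ρ reaches 1 while Pearson's r stays below 1.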

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
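
The description above corresponds to the variant usually called tau-a, sketched below by counting pairs directly. Tied pairs count as neither concordant nor discordant in this sketch; tie-adjusted variants such as tau-b exist and may give different values when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the example call, two of the three pairs are concordant and one is discordant, so the result is (2 - 1) / 3.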

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
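
Nullity correlation can be sketched as an ordinary Pearson correlation between per-column missing flags. That reading is an assumption about how such heatmaps are typically built, not something this report states. The flags below are taken from the ten rows listed under "First rows" in the Sample section, so the result describes those ten rows only, not the full dataset.

```python
from math import sqrt


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# 1 = value missing (NaN), 0 = value present, for the first ten sample rows.
blood_pressure_missing = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
insulin_missing = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]

nullity_corr = pearson_r(blood_pressure_missing, insulin_missing)
print(f"Nullity correlation (first ten rows): {nullity_corr:.3f}")
```

A value near +1 would mean the two columns tend to be missing together, and a value near -1 that one tends to be present when the other is missing.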

Sample

First rows

     df_index  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction   Age  Outcome
0           0            6    148.0           72.0      35.000000      NaN  33.6                     0.627  50.0        1
1           1            1     85.0           66.0      29.000000      NaN  26.6                     0.351  31.0        0
2           2            8    183.0           64.0      29.095969      NaN  23.3                     0.672  32.0        1
3           3            1     89.0           66.0      23.000000     94.0  28.1                     0.167  21.0        0
4           4            0    137.0           40.0      35.000000    168.0  43.1                     2.288  33.0        1
5           5            5    116.0           74.0      29.095969      NaN  25.6                     0.201  30.0        0
6           6            3     78.0           50.0      32.000000     88.0  31.0                     0.248  26.0        1
7           7           10    115.0            NaN      29.095969      NaN  35.3                     0.134  29.0        0
8           8            2    197.0           70.0      45.000000    543.0  30.5                     0.158  53.0        1
9          10            4    110.0           92.0      29.095969      NaN  37.6                     0.191  30.0        0

Last rows

     df_index  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction   Age  Outcome
727       758            1    106.0           76.0      29.095969      NaN  37.5                     0.197  26.0        0
728       759            6    190.0           92.0      29.095969      NaN  35.5                     0.278  66.0        1
729       760            2     88.0           58.0      26.000000     16.0  28.4                     0.766  22.0        0
730       761            9    170.0           74.0      31.000000      NaN  44.0                     0.403  43.0        1
731       762            9     89.0           62.0      29.095969      NaN  22.5                     0.142  33.0        0
732       763           10    101.0           76.0      48.000000    180.0  32.9                     0.171  63.0        0
733       764            2    122.0           70.0      27.000000      NaN  36.8                     0.340  27.0        0
734       765            5    121.0           72.0      23.000000    112.0  26.2                     0.245  30.0        0
735       766            1    126.0           60.0      29.095969      NaN  30.1                     0.349  47.0        1
736       767            1     93.0           70.0      31.000000      NaN  30.4                     0.315  23.0        0